ZEPPELIN-289: User can now enter custom expressions in notebooks' input fields #320
rolmovel wants to merge 2 commits into apache:master
Conversation
…ut fields. The expression will be evaluated server-side by Zeppelin before being sent to the interpreter.
Looks interesting, thank you for contributing! Please help me to understand: am I right that these changes potentially affect every interpreter's syntax, and code-wise are not localised to your particular use case with Spark SQL?
Hi @bzz,
@lucarosellini thanks for the explanation! @rolmovel Could you merge the latest master in to resolve conflicts as well as update
@rolmovel If this PR is still needed, can we try to rebase it?
This looks to be a unique and important feature; it would be great to have it in Zeppelin.
Actually, with Zeppelin we can use Spark SQL UDFs perfectly fine.
We developed a custom UDF library that parses absolute and relative dates. Feeding this library into Spark SQL through the standard UDF mechanism is suboptimal, since the UDF is invoked once for every row of the queried table.
Example:
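A query of this shape illustrates the problem (the table and column names here are illustrative, not taken from the PR):

```sql
SELECT *
FROM my_table
WHERE event_date > parseDate('yesterday')
```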
This repeats the call to parseDate(...) for every single row of 'my_table'.
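The per-row cost can be demonstrated with a toy filter (plain Python, counting calls only; `parse_date` merely stands in for the PR's custom UDF):

```python
calls = 0

def parse_date(expr):
    """Stand-in for the custom date-parsing UDF (illustrative only)."""
    global calls
    calls += 1
    return expr  # a real implementation would return a parsed date

rows = ["2015-01-01", "2015-01-02", "2015-01-03"]
# Filtering re-evaluates the same constant expression for every row,
# just like a per-row UDF call inside the query engine:
kept = [r for r in rows if r > parse_date("2015-01-01")]
print(calls)  # one call per row
```

With 1 million rows that would be 1 million evaluations of an expression whose result never changes.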
Even worse, if we filter for a date range like in:
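Again with illustrative names, such a range filter might read:

```sql
SELECT *
FROM my_table
WHERE event_date > parseDate('yesterday')
  AND event_date < parseDate('today')
```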
the call to parseDate(...) is performed twice for each row in the table.
Since Spark's UDFs do not have a concept of an 'execution context', we were not able to work around this problem on the Spark side.
We implemented a mechanism that evaluates UDFs in Zeppelin itself, before the query parameters are sent to the interpreter. When parametrizing queries as usual, you can now enter expressions in Zeppelin's input forms like:
or:
This is similar to how prepared statements work in standard SQL, where parameter values are computed once before being sent to the execution engine.
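The underlying idea can be sketched as follows (a minimal Python illustration of the concept only; Zeppelin's actual Evaluator is implemented in Java and its expression syntax differs, and all names below are hypothetical):

```python
import datetime

def evaluate_params(params):
    """Evaluate each form-parameter expression exactly once, server-side."""
    evaluated = {}
    for name, expr in params.items():
        # The expression runs a single time here, not once per table row.
        evaluated[name] = eval(expr, {"datetime": datetime})
    return evaluated

def bind(template, params):
    """Substitute the pre-evaluated values into the query text."""
    for name, value in evaluate_params(params).items():
        template = template.replace("${" + name + "}", str(value))
    return template

query = bind(
    "SELECT * FROM my_table WHERE event_date > '${start}'",
    {"start": "datetime.date(2015, 1, 1)"},
)
print(query)  # the interpreter receives only the final literal value
```

The interpreter never sees the expression, only its result, so the engine evaluates nothing per row.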
You can find more info in the org.apache.zeppelin.display.Evaluator javadoc.
The above-mentioned query over a table of 1 million records takes about 1 minute; with this PR applied, the execution time is reduced to about 15 seconds.